Improving Grapheme Codebook Selection for Scribe Identification
Abstract
In this paper we test several approaches to analysing grapheme codebook features for offline writer identification in medieval English scribal manuscripts. Current methods for selecting a codebook typically produce codebooks that perform no better than random grapheme selection, so our aim in this analysis is to identify potential methods of improving codebook selection. Three feature extraction methods are tested, and a number of feature selection methods are proposed and compared. Results show that PCA-based selection and a broad range of grapheme similarities perform best, while reducing computation time by a factor of four. All methods are compared on a modern dataset and a medieval dataset with very different characteristics; the results are robust to data variation.
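The abstract does not say how PCA-based selection is carried out, so the following is only a minimal sketch of one possible reading: rank codebook graphemes by the magnitude of their loading on the first principal component of a writers-by-graphemes occurrence matrix, and keep the top k. The function names, the use of a single component, and the power-iteration solver are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def first_pc_loadings(rows, iters=200, seed=0):
    """Approximate the first principal-component loadings by power iteration.

    `rows` is a hypothetical writers-by-graphemes matrix: one occurrence
    histogram per writer, one column per codebook entry.
    """
    n, d = len(rows), len(rows[0])
    # Centre each column (codebook entry) on its mean across writers.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Seeded random start: a uniform start vector would be annihilated
    # when every histogram sums to the same total.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # One step of power iteration on X^T X: w = X^T (X v).
        scores = [sum(xi[j] * v[j] for j in range(d)) for xi in x]
        w = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variance left in the direction reached; keep current v
        v = [c / norm for c in w]
    return v


def select_codebook_entries(rows, k):
    """Return the indices of the k entries with the largest |loading| (assumed criterion)."""
    loadings = first_pc_loadings(rows)
    ranked = sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)
    return sorted(ranked[:k])
```

Calling `select_codebook_entries(histograms, k)` on such a matrix returns the column indices of the retained graphemes. A fuller treatment would likely combine several components weighted by explained variance; a single component keeps the sketch short.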
Similar resources
A fast search method of speaker identification for large population using pre-selection and hierarchical matching
The performance of search during the matching phase of a speaker identification system realized through vector quantization (VQ) is investigated in this paper. The voice of each person is recorded in an office room with personal computers. The LPC-cepstrum is selected as the feature vector. To achieve a higher identification success rate, it is necessary to use a larger codebook for each person. Consequen...
Improving automatic writer identification
State-of-the-art systems for automatic writer identification from handwritten text are based on two approaches: a statistical approach or a model-based approach. Both approaches have limitations. The main limitation of the statistical approach is that it relies on single-scale statistical features. The main limitation of the model-based approach is that the codebook generation is time-consuming...
Predicting the scribe behind a page of medieval handwriting
This paper addresses the issue of attributing pieces of medieval handwriting to scribes known from other examples of writing. The system is applied to manuscript page images and performs extraction and comparison of letter shapes. Letters and sequences of connected letters are identified by means of connected component labeling. This is followed by further splitting into letter-size pieces. The...
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on Shuffled Frog Leaping Algorithm that is called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most of the biological datasets such as cancer datasets have a large number of genes and few samples. However, most of these genes are not usable in some tasks for example in cancer classification....
Category-based phoneme-to-grapheme transliteration
Grapheme-based speech recognition systems are faster to develop but typically do not reach the same level of performance as phoneme-based systems. In this paper we introduce a technique for improving the performance of standard grapheme-based systems. We find that by handling a relatively small number of irregular words through phoneme-to-grapheme (P2G) transliteration – transforming the origin...